160 research outputs found
The importance of better models in stochastic optimization
Standard stochastic optimization methods are brittle, sensitive to stepsize
choices and other algorithmic parameters, and they exhibit instability outside
of well-behaved families of objectives. To address these challenges, we
investigate models for stochastic minimization and learning problems that
exhibit better robustness to problem families and algorithmic parameters. With
appropriately accurate models---which we call the aProx family---stochastic
methods can be made stable, provably convergent and asymptotically optimal;
even modeling that the objective is nonnegative is sufficient for this
stability. We extend these results beyond convexity to weakly convex
objectives, which include compositions of convex losses with smooth functions
common in modern machine learning applications. We highlight the importance of
robustness and accurate modeling with a careful experimental evaluation of
convergence time and algorithm sensitivity.
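For objectives known to be nonnegative, the truncated model max(f(x) + ⟨f'(x), y - x⟩, 0) yields a closed-form update: an ordinary SGD step whose stepsize is clipped at the point where the linear model hits zero. A minimal Python sketch under that assumption (the toy loss and data are hypothetical, not from the paper):

```python
import numpy as np

def truncated_sgd_step(x, f_val, grad, alpha):
    """One aProx-style truncated-model step for a nonnegative objective.

    Minimizes max(f_val + grad.(y - x), 0) + ||y - x||^2 / (2 * alpha),
    whose minimizer is an SGD step with the stepsize clipped at
    f_val / ||grad||^2 (where the linear lower bound reaches zero).
    """
    g2 = float(np.dot(grad, grad))
    if g2 == 0.0:
        return x
    return x - min(alpha, f_val / g2) * grad

# Toy problem (hypothetical data): f(x) = |a.x - b|, a nonnegative loss.
rng = np.random.default_rng(0)
a, b = rng.normal(size=3), 0.5
x = np.zeros(3)
for k in range(1, 201):
    r = float(a @ x) - b
    x = truncated_sgd_step(x, abs(r), np.sign(r) * a, alpha=1.0 / np.sqrt(k))
print(abs(float(a @ x) - b))  # residual driven to (numerical) zero
```

The clipping is what makes the method insensitive to the stepsize choice alpha: an overly large alpha is harmlessly truncated rather than causing the iterates to diverge.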
Mean Estimation from Adaptive One-bit Measurements
We consider the problem of estimating the mean of a normal distribution under
the following constraint: the estimator can access only a single bit from each
sample from this distribution. We study the squared error risk in this
estimation as a function of the number of samples and one-bit measurements.
We consider an adaptive estimation setting where the single bit sent at each
step is a function of both the new sample and the previously acquired bits.
For this setting, we show that no estimator can attain asymptotic mean squared
error smaller than π/2 times the variance of the sample mean. In other words,
the one-bit restriction increases the number of samples required for a
prescribed accuracy of estimation by a factor of at least π/2 compared to the
unrestricted case. In addition, we provide an explicit estimator that attains
this asymptotic error, showing that, rather surprisingly, only π/2 times
more samples are required in order to attain estimation performance equivalent
to the unrestricted case.
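One way to realize such an adaptive scheme is a Robbins-Monro recursion that compares each sample to the running estimate. This is a standard construction consistent with the abstract, not necessarily the paper's exact estimator, and it assumes the variance is known:

```python
import numpy as np

def adaptive_one_bit_mean(samples, sigma):
    """Estimate the mean from one sign bit per sample (Robbins-Monro sketch).

    Bit i records only whether sample i exceeds the current estimate; the
    1/i gain with constant sigma * sqrt(pi/2) is the classical choice whose
    asymptotic MSE for normal data is (pi/2) * sigma^2 / n.
    """
    theta = 0.0
    c = sigma * np.sqrt(np.pi / 2.0)
    for i, xi in enumerate(samples, start=1):
        bit = 1.0 if xi >= theta else -1.0  # the only bit the estimator sees
        theta += (c / i) * bit
    return theta

rng = np.random.default_rng(1)
mu, sigma = 2.0, 1.0
est = adaptive_one_bit_mean(rng.normal(mu, sigma, 200_000), sigma)
print(est)  # close to mu
```

Each bit depends on all previously acquired bits only through the current estimate theta, so the scheme fits the adaptive setting described above.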
Mean Estimation from One-Bit Measurements
We consider the problem of estimating the mean of a symmetric log-concave
distribution under the constraint that only a single bit per sample from this
distribution is available to the estimator. We study the mean squared error as
a function of the sample size (and hence the number of bits). We consider three
settings: first, a centralized setting, where an encoder may release n bits
given a sample of size n, and for which there is no asymptotic penalty for
quantization; second, an adaptive setting in which each bit is a function of
the current observation and previously recorded bits, where we show that the
optimal relative efficiency compared to the sample mean is precisely the
efficiency of the median; lastly, we show that in a distributed setting where
each bit is only a function of a local sample, no estimator can achieve optimal
efficiency uniformly over the parameter space. We additionally complement our
results in the adaptive setting by showing that a single round of adaptivity
is sufficient to achieve the optimal mean squared error.
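The quantitative content of the median comparison is easy to check numerically; a quick Monte Carlo sketch (the sample sizes below are arbitrary choices, not from the paper):

```python
import numpy as np

# Monte Carlo sketch: for normal data the sample median's variance is
# asymptotically pi/2 times the sample mean's, i.e. the median's relative
# efficiency is 2/pi, matching the adaptive one-bit limit described above.
rng = np.random.default_rng(2)
reps, n = 4000, 1001
draws = rng.normal(0.0, 1.0, size=(reps, n))
ratio = np.median(draws, axis=1).var() / draws.mean(axis=1).var()
print(ratio)  # close to pi/2 ~ 1.571
```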
Distributed Delayed Stochastic Optimization
We analyze the convergence of gradient-based optimization algorithms that
base their updates on delayed stochastic gradient information. The main
application of our results is to the development of gradient-based distributed
optimization algorithms where a master node performs parameter updates while
worker nodes compute stochastic gradients based on local information in
parallel, which may give rise to delays due to asynchrony. We take motivation
from statistical problems where the size of the data is so large that it cannot
fit on one computer; with the advent of huge datasets in biology, astronomy,
and the internet, such problems are now common. Our main contribution is to
show that for smooth stochastic problems, the delays are asymptotically
negligible and we can achieve order-optimal convergence results. In application
to distributed optimization, we develop procedures that overcome communication
bottlenecks and synchronization requirements. We show n-node architectures
whose optimization error in stochastic problems---in spite of asynchronous
delays---scales asymptotically as O(1/√(nT)) after T iterations.
This rate is known to be optimal for a distributed system with n nodes even
in the absence of delays. We additionally complement our theoretical results
with numerical experiments on a statistical machine learning task.
Comment: 27 pages, 4 figures
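A minimal sketch of the delayed-update idea, with a fixed delay standing in for asynchronous workers (the objective, stepsizes, and names are illustrative, not the paper's exact protocol):

```python
import numpy as np
from collections import deque

def delayed_sgd(noisy_grad, x0, steps, delay, stepsize):
    """SGD sketch in which each update uses a gradient evaluated at the
    parameters from `delay` steps earlier, standing in for asynchrony
    between a master node and its workers."""
    history = deque([x0] * (delay + 1), maxlen=delay + 1)
    x = x0
    for t in range(1, steps + 1):
        stale = history[0]              # parameters from `delay` steps ago
        x = x - stepsize(t) * noisy_grad(stale)
        history.append(x)
    return x

# Toy smooth stochastic problem (hypothetical): minimize E[0.5*||x - Z||^2]
# with Z ~ N(mu, I), whose minimizer is mu.
rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
x_final = delayed_sgd(lambda x: x - rng.normal(mu, 1.0),
                      np.zeros(2), steps=50_000, delay=10,
                      stepsize=lambda t: 1.0 / np.sqrt(t))
print(x_final)  # approaches mu despite stale gradients
```

With the 1/√t stepsizes, the staleness shrinks the same way the noise does, which is the intuition behind the delays becoming asymptotically negligible.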
- …